suppress warning message of pandas_on_spark to_spark #1058

Merged
merged 2 commits into microsoft:main from suppress_warning on Jun 1, 2023

Conversation

thinkall
Collaborator

Why are these changes needed?

To suppress the warning message from the pandas-on-Spark `to_spark` conversion, shown below:

PandasAPIOnSparkAdviceWarning: If `index_col` is not specified for `to_spark`, the existing index is lost when converting to Spark DataFrame.

The warning message could appear many times during the AutoML/tuning process, which is annoying and unnecessary.
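The diff itself is not shown in this thread, but a minimal sketch of one way to silence only this advice warning around a conversion (the helper name `to_spark_df` is hypothetical, not necessarily the actual FLAML change) could look like:

import warnings

import pyspark.pandas as ps


def to_spark_df(psdf: ps.DataFrame):
    # Hypothetical helper: convert a pandas-on-Spark DataFrame to a
    # Spark DataFrame without emitting the index_col advice warning.
    with warnings.catch_warnings():
        # Match the start of the advice message quoted above.
        warnings.filterwarnings(
            "ignore",
            message="If `index_col` is not specified for `to_spark`",
        )
        return psdf.to_spark()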

Related issue number

Checks

@thinkall thinkall requested review from 0101, sonichi, levscaut and qingyun-wu and removed request for 0101 May 30, 2023 08:51
Collaborator

@levscaut levscaut left a comment


The suppression is validated in our customized build of FLAML; it greatly reduces redundant and unformatted output in the console.

Contributor

@sonichi sonichi left a comment


The solution is OK. Just a reminder that these functions are on the critical path and the import statement inside them can slow down the performance.

@thinkall
Collaborator Author

thinkall commented May 31, 2023

> The solution is OK. Just a reminder that these functions are on the critical path and the import statement inside them can slow down the performance.

Thanks for the reminder. Below is a simple performance test.

def test_import():
    # Mirrors the PR change: the import and filter run on every call.
    import warnings
    warnings.filterwarnings("ignore")

    y = 3 + 5
    x = y + 2

    return x, y


def test_no_import():
    # Same trivial work without the in-function import, as a baseline.
    y = 3 + 5
    x = y + 2

    return x, y


if __name__ == "__main__":
    import timeit

    # Total wall time for 1e6 calls of each variant.
    num_calls = int(1e6)
    print(timeit.timeit("test_import()", setup="from __main__ import test_import", number=num_calls))
    print(timeit.timeit("test_no_import()", setup="from __main__ import test_no_import", number=num_calls))

Results on my local machine:

0.6618426890054252   (test_import)
0.06726869300473481  (test_no_import)

Looks like the overhead is acceptable.

@sonichi
Contributor

sonichi commented Jun 1, 2023

> Looks like the overhead is acceptable.

For the non-Spark case, a 0.5 s overhead per call is considered big because the call happens many times in the inner loop, not just once.

@thinkall
Collaborator Author

thinkall commented Jun 1, 2023


> For the non-Spark case, a 0.5 s overhead per call is considered big because the call happens many times in the inner loop, not just once.

The overhead is for 1e6 calls, so it is well under a microsecond per call.
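For clarity, the per-call cost can be derived from the timings reported above (a small illustrative calculation, not part of the PR):

num_calls = int(1e6)
with_import = 0.6618426890054252      # total seconds for 1e6 calls of test_import
without_import = 0.06726869300473481  # total seconds for 1e6 calls of test_no_import

# Extra cost attributable to the in-function import and filter, per call.
per_call_overhead = (with_import - without_import) / num_calls
print(f"{per_call_overhead * 1e6:.3f} microseconds per call")  # ~0.595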

@thinkall thinkall requested a review from sonichi June 1, 2023 04:25
@sonichi sonichi added this pull request to the merge queue Jun 1, 2023
Merged via the queue into microsoft:main with commit d36b2af Jun 1, 2023
@thinkall thinkall deleted the suppress_warning branch June 1, 2023 23:27